ABSTRACT


Seismic monitoring for underground nuclear explosions answers three questions for all global seismic activity: Where is the seismic event located? What is the event source type (event identification)? If the event is an explosion, what is the yield? Answering these questions involves processing seismometer waveforms whose propagation paths lie predominantly in the mantle. Four discriminants commonly used to identify teleseismic events are depth from travel time, presence of long-period surface energy (mb versus Ms), depth from reflected phases, and polarity of first motion. The seismic theory for these discriminants is well established in the literature (see, for example, Pomeroy et al. [1982] and Blandford [1982]). However, the physical basis of each has not been formally integrated into probability models that account for statistical error and provide discriminant calculations generally appropriate for multi-dimensional event identification. This paper develops a mathematical statistics formulation of these discriminants and offers a novel approach to multi-dimensional discrimination that is readily extensible to other discriminants. For each discriminant, a probability model is formulated under a general null hypothesis of H0: Explosion Characteristics. The veracity of the hypothesized model is measured with a p-value calculation (see Stuart et al. [1994] and Freedman et al. [1991]) that is transformed to be approximately normally distributed while remaining in the range [0, 1]. A value near zero rejects H0, and a moderate to large value indicates consistency with H0. The hypothesis test formulation ensures that seismic phenomenology is tied to the interpretation of the p-value. These p-values are then embedded into a multi-discriminant algorithm developed from the regularized discrimination methods proposed by Smidt and McDonald (1976), DiPillo (1976), and Friedman (1989). Anderson et al. (2007) demonstrate the performance of the methods with 102 teleseismic events with magnitudes (mb) ranging from 5 to 6.5 and also give example p-value calculations for two of these events. Preliminary studies on the statistical properties of p-values are presented here.
OBJECTIVES

Anderson et al. (2007) propose a unifying framework for seismic event identification that can be populated with a diversity of seismic discriminants. For inclusion in the framework, a discriminant's physical theory must be mathematically embedded into a probability model designed to capture significant sources of error. This is accomplished by formulating each discriminant as a statistical hypothesis test under a general null hypothesis of H0: Explosion Characteristics. For example, a depth null hypothesis under Explosion Characteristics might be H0: event depth ≤ 10 km, with the logical alternative hypothesis HA: event depth > 10 km. The veracity of the null hypothesis for each discriminant is measured with a p-value calculation, and the p-value itself is used as the discriminant. The p-value ranges between zero and one, with a value near zero indicating inconsistency with Explosion Characteristics and a moderate to large value indicating consistency with Explosion Characteristics. With this approach to discriminant construction, the p-value carries information about source type that is fully adjusted for natural and measurement variability. This places a high standard on the construction of the discriminants: seismic phenomenology and path corrections must be integrated into an appropriate probability model, and a seismic-based hypothesis test must be constructed. The p-values under this formulation can be viewed as standardized discriminants with a common interpretation across geographical regions and different discriminants.
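To make the depth example concrete, here is a minimal Python sketch of a one-sided p-value for H0: depth ≤ 10 km versus HA: depth > 10 km. It assumes, hypothetically, that the travel-time depth estimate is Gaussian with a known standard error; the paper's actual depth model and path corrections are not reproduced, and the function name `depth_p_value` is illustrative.

```python
from statistics import NormalDist


def depth_p_value(depth_km, std_error_km, max_explosion_depth_km=10.0):
    """One-sided p-value for H0: depth <= max_explosion_depth_km versus
    HA: depth > max_explosion_depth_km.

    Hypothetical error model: the depth estimate is Gaussian with a known
    standard error, and the p-value is evaluated at the H0 boundary.
    """
    if std_error_km <= 0:
        raise ValueError("standard error must be positive")
    z = (depth_km - max_explosion_depth_km) / std_error_km
    # Upper-tail probability: deep estimates give small p-values (reject H0).
    return 1.0 - NormalDist().cdf(z)


for depth, se in [(3.0, 4.0), (10.0, 4.0), (25.0, 4.0)]:
    print(f"depth {depth:5.1f} km (se {se} km) -> p-value {depth_p_value(depth, se):.4f}")
```

A shallow estimate yields a large p-value (consistent with Explosion Characteristics), while an estimate well below 10 km of depth relative to its error yields a p-value near zero.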

For continuous discriminants such as depth from travel time, spectral ratios, or mb versus Ms, the p-value has a uniform probability distribution on [0, 1] when the null hypothesis is true (e.g., the event is an explosion), and a probability distribution with most of its mass near zero when the null hypothesis is false (the event is an earthquake). The concentration of probability mass near zero is determined by the degree of disagreement between the true probability model of the data and the null hypothesis model.
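The uniform-versus-concentrated behavior can be checked by simulation. The sketch below assumes a Gaussian null model N(0, 1) and a hypothetical alternative N(-2, 1), computes left-tail p-values with Python's `statistics.NormalDist`, and reports the empirical fraction of p-values below a few cutoffs. The models and helper names are illustrative assumptions, not the paper's calibrated populations.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def left_tail_p_value(x, mu0=0.0, sigma0=1.0):
    """Left-tail p-value of observation x under the null model N(mu0, sigma0^2)."""
    return STD_NORMAL.cdf((x - mu0) / sigma0)


def simulate_p_values(true_mean, n=5000, seed=1):
    """Draw n values from N(true_mean, 1) and convert each one to a p-value
    under the null model N(0, 1)."""
    rng = random.Random(seed)
    return [left_tail_p_value(rng.gauss(true_mean, 1.0)) for _ in range(n)]


def fraction_below(values, cutoff):
    """Empirical probability that a value falls below cutoff."""
    return sum(v < cutoff for v in values) / len(values)


# Null hypothesis true (hypothetical explosion model N(0, 1)).
p_h0 = simulate_p_values(true_mean=0.0)
# Null hypothesis false (hypothetical earthquake model N(-2, 1)).
p_ha = simulate_p_values(true_mean=-2.0, seed=2)

for label, p in (("H0 true ", p_h0), ("H0 false", p_ha)):
    print(label, [round(fraction_below(p, c), 3) for c in (0.1, 0.5, 0.9)])
```

Under the true null, the three fractions should land near 0.1, 0.5, and 0.9, as a uniform distribution requires; under the shifted model, most p-values fall below 0.1.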

The hypothesis test p-values can be mildly transformed to become standardized discriminants Y that also possess predictable statistical properties. The standardized discriminants also range between zero and one, their interpretation is completely analogous to that of p-values, and they are approximately Gaussian. Therefore, established Gaussian discrimination methods can be used to formulate a unified decision from standardized discriminants. Specifically, the equation

$$Y = \frac{2}{\pi}\arcsin\left(\sqrt{p\text{-value}}\right)$$

is well established in statistical theory as a transformation to achieve approximately Gaussian behavior in data bounded between zero and one. Figure 1 illustrates the effect of standardizing the hypothesis test p-values. Precedent for interpreting p-values as discriminants can be found in Maharaj (2000) and Dumbgen and Homke (2000).
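A minimal Python sketch of the standardizing transform follows. It assumes the scaling constant 2/π, which is what maps the arcsine of the square root back onto [0, 1]; the function name `standardize` is hypothetical.

```python
import math


def standardize(p_value):
    """Arcsine square-root transform Y = (2/pi) * asin(sqrt(p)).

    Maps a p-value in [0, 1] to a standardized discriminant Y in [0, 1]:
    p = 0 gives Y = 0, p = 1 gives Y = 1, and the map is increasing.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return (2.0 / math.pi) * math.asin(math.sqrt(p_value))


for p in (0.0, 0.01, 0.25, 0.5, 1.0):
    print(f"p = {p:<5} -> Y = {standardize(p):.4f}")
```

Because the map is monotone and fixes both endpoints, a small Y carries the same meaning as a small p-value; the transform only reshapes the distribution.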
Figure 1. Transformation to induce an approximate Gaussian distribution on individual p-values (denoted p in the graphs) to derive standardized discriminants. The H0 probability distribution is gray, and the HA probability distribution is black.

In the framework proposed in Anderson et al. (2007), standardized discriminants Y are mathematically combined for source identification. This is accomplished with a typicality index calculation (see McLachlan [1992]) that measures the degree of agreement a suite of observed discriminants has with the earthquake and explosion populations.

With the typicality index calculation, an event can be declared

• consistent with historical explosions,

• not consistent with historical earthquakes,

• consistent with explosions and earthquakes (indeterminate), or

• not consistent with either earthquakes or explosions (unidentified).

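These declarations are technically defensible because each is the outcome of a formal statistical test of consistency with a historical population.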
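The typicality idea can be sketched for two standardized discriminants. This sketch assumes Gaussian population models with known centroids and covariances and uses the chi-square approximation with 2 degrees of freedom, P(chi-square ≥ D²) = exp(-D²/2), where D² is the squared Mahalanobis distance; versions that account for estimated parameters are not implemented here. The significance level `alpha`, the model values, and the mapping of the two outcomes to declarations are hypothetical illustrations.

```python
import math


def mahalanobis_sq_2d(y, centroid, cov):
    """Squared Mahalanobis distance of a 2-vector y from centroid under the
    2x2 covariance cov = ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    u, v = y[0] - centroid[0], y[1] - centroid[1]
    # Inverse of ((a, b), (b, d)) is ((d, -b), (-b, a)) / det.
    return (d * u * u - 2.0 * b * u * v + a * v * v) / det


def typicality_2d(y, centroid, cov):
    """Typicality index via the chi-square (2 df) tail: exp(-D^2 / 2)."""
    return math.exp(-0.5 * mahalanobis_sq_2d(y, centroid, cov))


def declare(y, expl_model, eq_model, alpha=0.05):
    """Map two typicality tests to a declaration (hypothetical labels)."""
    fits_expl = typicality_2d(y, *expl_model) >= alpha
    fits_eq = typicality_2d(y, *eq_model) >= alpha
    if fits_expl and fits_eq:
        return "indeterminate"
    if not fits_expl and not fits_eq:
        return "unidentified"
    if fits_expl:
        return "consistent with explosions, not with earthquakes"
    return "consistent with earthquakes, not with explosions"


# Hypothetical (centroid, covariance) models on the standardized Y scale.
COV = ((0.02, 0.005), (0.005, 0.02))
EXPL_MODEL = ((0.5, 0.5), COV)
EQ_MODEL = ((0.05, 0.05), COV)

for y in [(0.5, 0.5), (0.275, 0.275), (0.05, 0.05), (0.9, 0.0)]:
    print(y, declare(y, EXPL_MODEL, EQ_MODEL))
```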
In the framework, a second source identification calculation is made that assumes only two possible decisions, earthquake or explosion (indeterminate and unidentified are not possible). Bayesian statistical methods can be used to calculate P(earthquake | event data) and P(explosion | event data). Note that these two probabilities sum to one. One possible rule is simply to take the higher of the two probabilities as the source identification. The objective of this research was to determine some of the statistical properties of Bayesian source identification calculations made with simulated teleseismic discriminants X versus the corresponding standardized discriminants Y.
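The two-decision Bayesian calculation can be sketched for two discriminants, assuming Gaussian explosion and earthquake models that share a pooled covariance (the usual linear-discriminant setting) and equal prior probabilities. The covariance values and function names below are hypothetical; the sketch shows only how P(explosion | event data) and P(earthquake | event data) are formed and why they sum to one.

```python
import math


def gaussian_log_density_2d(y, centroid, cov):
    """Log density of the bivariate Gaussian N(centroid, cov) at y."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    u, v = y[0] - centroid[0], y[1] - centroid[1]
    quad = (d * u * u - 2.0 * b * u * v + a * v * v) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def posterior_explosion(y, expl_centroid, eq_centroid, pooled_cov, prior_expl=0.5):
    """P(explosion | y) for two Gaussian source models sharing pooled_cov."""
    log_expl = math.log(prior_expl) + gaussian_log_density_2d(y, expl_centroid, pooled_cov)
    log_eq = math.log(1.0 - prior_expl) + gaussian_log_density_2d(y, eq_centroid, pooled_cov)
    diff = log_eq - log_expl
    # Logistic function of -diff, written to avoid overflow in math.exp.
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


def identify(y, expl_centroid, eq_centroid, pooled_cov):
    """Take the larger posterior as the source identification."""
    p_expl = posterior_explosion(y, expl_centroid, eq_centroid, pooled_cov)
    p_eq = 1.0 - p_expl  # the two posteriors sum to one
    label = "explosion" if p_expl >= p_eq else "earthquake"  # ties -> explosion
    return label, p_expl, p_eq


COV = ((1.0, 0.5), (0.5, 1.0))  # hypothetical pooled covariance
print(identify((0.0, 0.0), (0.0, 0.0), (-2.0, -2.0), COV))
print(identify((-1.8, -1.5), (0.0, 0.0), (-2.0, -2.0), COV))
```

Sending ties to "explosion" is an arbitrary convention of this sketch; with equal priors and a shared covariance, ties occur only on the linear boundary midway between the centroids.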


RESEARCH ACCOMPLISHED


A simulation was performed that emulates the first-order properties of teleseismic discriminants for earthquakes and explosions. The simulation represents the potential source populations for two discriminants, e.g., mb versus Ms (denoted X1) and a teleseismic spectral ratio discriminant (denoted X2) (see Taylor and Marshall [1991]). A subset of the simulation is reported here. The explosion model used to simulate X1 and X2 had the centroid (0,0) and a
The earthquake model had the centroid (-2,-2) with a suite of covariance
First, using one of the earthquake/explosion model combinations above (the true explosion/earthquake models), 30 explosions and 300 earthquakes were simulated, emulating the acquisition of teleseismic discriminant calibration data (X1 and X2). These calibration data were used to calculate the centroids (calibrated centroids) and a pooled covariance (calibrated covariance) for the explosion and earthquake models for data X1 and X2. The simulated calibration data were then converted to p-values via Z-scores. The Z-scores for the X1 data are obtained by subtracting the calibrated explosion mean from the X1 data and dividing by the standard deviation of X1 from the calibrated covariance matrix. The p-value for each data point X1 is then the left-tail probability of the standard Gaussian distribution at its Z-score (see Figure 2). Calculations are analogous for the X2 data. With these p-value calculations, the explosion data for both X1 and X2 will have an approximately uniform histogram, and the earthquake data will have a distribution that is tightly packed near zero (see the left graphic in Figure 1). The calibration p-values are then transformed to standardized discriminants Y1 and Y2. These data are then used to calculate the centroids (calibrated centroids) and a pooled covariance (calibrated covariance) for the explosion and earthquake models for data Y1 and Y2. On completion of the calibration step, we have the models for source identification with either teleseismic discriminants X1 and X2 or standardized discriminants Y1 and Y2.
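The calibration step can be sketched end to end in Python. Because the covariance suite used in the paper is not reproduced in this section, the sketch assumes unit variances with correlation 0.5 for both populations; the centroids (0,0) and (-2,-2) and the sample sizes 30 and 300 follow the text. All function names and the random seed are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def simulate(n, center, rho, rng):
    """n draws from a bivariate Gaussian with unit variances and correlation
    rho (hypothetical covariance)."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((center[0] + z1,
                    center[1] + rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return out


def centroid(data):
    return (fmean(p[0] for p in data), fmean(p[1] for p in data))


def pooled_covariance(groups):
    """Pooled within-group covariance: summed cross-products / (N - g)."""
    sxx = sxy = syy = 0.0
    n_total = 0
    for g in groups:
        cx, cy = centroid(g)
        for x, y in g:
            sxx += (x - cx) ** 2
            sxy += (x - cx) * (y - cy)
            syy += (y - cy) ** 2
        n_total += len(g)
    dof = n_total - len(groups)
    return ((sxx / dof, sxy / dof), (sxy / dof, syy / dof))


def to_standardized(point, expl_mean, cov):
    """Z-score each discriminant against the calibrated explosion mean, take the
    left-tail Gaussian p-value, then apply Y = (2/pi) asin(sqrt(p))."""
    sds = (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
    ys = []
    for x, mu, sd in zip(point, expl_mean, sds):
        p = STD_NORMAL.cdf((x - mu) / sd)
        ys.append((2.0 / math.pi) * math.asin(math.sqrt(p)))
    return tuple(ys)


rng = random.Random(2007)
expl_cal = simulate(30, (0.0, 0.0), rho=0.5, rng=rng)
eq_cal = simulate(300, (-2.0, -2.0), rho=0.5, rng=rng)

# Calibrated models for the raw discriminants X1, X2.
x_expl_mean, x_eq_mean = centroid(expl_cal), centroid(eq_cal)
x_cov = pooled_covariance([expl_cal, eq_cal])

# Calibrated models for the standardized discriminants Y1, Y2.
y_expl = [to_standardized(p, x_expl_mean, x_cov) for p in expl_cal]
y_eq = [to_standardized(p, x_expl_mean, x_cov) for p in eq_cal]
y_expl_mean, y_eq_mean = centroid(y_expl), centroid(y_eq)
y_cov = pooled_covariance([y_expl, y_eq])
```

The calibrated centroids and pooled covariances for X and for Y are the two sets of models that a Gaussian Bayesian rule would then use for new events.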


Figure 2. Graphical representation of p-value calculations from Z-scores for X1 and X2.

In the next step, 5000 explosions and 5000 earthquakes are simulated using the true earthquake/explosion models. These data emulate new events with associated teleseismic discriminants (X1 and X2). The standardized discriminants Y1 and Y2 are calculated from these data using the calibrated explosion means and the standard deviations from the calibrated covariance matrix; the calculations are exactly the same as those made in the calibration step. These 5000 simulated explosions and 5000 simulated earthquakes are used to compare the properties of X1 and X2 versus Y1 and Y2 in the Bayesian source identification calculation. For this simulation study, the larger of P(earthquake | event data) and P(explosion | event data) is taken as the source identification, with both the teleseismic discriminant data X1 and X2 and the standardized discriminants. With these simulated event data, the probability of correctly identifying an explosion (PD, probability of detection) and the probability of a false alarm (FA, false-alarm probability) can be calculated: PD is the number of times an explosion is correctly identified divided by 5000, and FA is the number of times an earthquake is identified as an explosion divided by 5000. To compare the properties of Bayesian source identification with the two sets of discriminants, we use the ratio FA/PD. This ratio is the number of false alarms per detection; the smaller the ratio, the better. The ratios FA/PD are reported in Table 1 for Bayesian source identification with simulated teleseismic discriminants and standardized discriminants. The true models used in the simulation are represented graphically in the top row of Table 1. In all cases, using the standardized discriminants gives better false-alarm performance relative to explosion identification probability. This is consistent with the results observed in the full simulation.
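The FA/PD comparison can be sketched as follows, counting a detection as an explosion declared an explosion and a false alarm as an earthquake declared an explosion. For brevity, the classifier is a hypothetical known-parameter rule (equal priors, identity covariance, centroids (0,0) and (-2,-2)), which reduces to the boundary x1 + x2 = -2; it stands in for the calibrated Bayesian rule and is not the paper's implementation.

```python
import random


def fa_per_detection(classify, explosions, earthquakes):
    """PD = fraction of explosions declared 'explosion';
    FA = fraction of earthquakes declared 'explosion'.
    Returns (PD, FA, FA/PD)."""
    pd = sum(classify(e) == "explosion" for e in explosions) / len(explosions)
    fa = sum(classify(q) == "explosion" for q in earthquakes) / len(earthquakes)
    return pd, fa, (fa / pd if pd > 0 else float("inf"))


def lda_rule(x):
    """Hypothetical rule: equal priors, identity covariance, centroids
    (0, 0) and (-2, -2); the decision boundary is x1 + x2 = -2."""
    return "explosion" if x[0] + x[1] > -2.0 else "earthquake"


rng = random.Random(5000)
explosions = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
earthquakes = [(rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)) for _ in range(5000)]

pd, fa, ratio = fa_per_detection(lda_rule, explosions, earthquakes)
print(f"PD = {pd:.3f}, FA = {fa:.3f}, FA/PD = {ratio:.4f}")
```

With these hypothetical models, PD should come out near 0.92 and FA near 0.08, so FA/PD is roughly 0.085; passing a rule built on standardized discriminants to `fa_per_detection` gives the second ratio for a comparison of the kind reported in Table 1.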


Table 1. False alarms per detection for simulated teleseismic discriminants and standardized discriminants. True models used to simulate teleseismic discriminants X1 and X2 are presented graphically in the first row. Ellipses for the models are 95% probability regions.


CONCLUSIONS AND RECOMMENDATIONS


Conclusions are preliminary: theorems are not proven with simulations. However, based on simulations of statistical population behavior typical of some teleseismic discriminants, we can conclude that standardized discriminants give improved operational performance over teleseismic discriminants, as measured by false alarms per detection (FA/PD). Using p-values as discriminants has the advantage of unifying physical and statistical corrections into a single measurement. Therefore, in principle, p-values represent pure information about a seismic event source type. This alone is a compelling reason for using p-values as discriminants. The preliminary performance properties of p-values further support their use as seismic discriminants. Further research includes comprehensive simulations and potentially the development of mathematical arguments (theorems) that generalize this property of standardized discriminants.

REFERENCES 

Anderson, D. N., D. K. Fagan, M. A. Tinker, G. D. Kraft, and K. D. Hutchenson (2007). A mathematical statistics formulation of the teleseismic explosion identification problem with multiple discriminants. To appear in Bull. Seism. Soc. Am.

Blandford, R. (1982). Seismic event discrimination. Bull. Seism. Soc. Am. 72: 69-87.

DiPillo, P. (1976). The application of bias to discriminant analysis. Communications in Statistics (Theory and Methods) 5: 843-854.

Dumbgen, L. and L. Homke (2000). P-values for discriminant analysis. Preprint series A-00-14, Schriftenreihe der Institute fuer Informatik/Mathematik, University of Lubeck, Germany.

Freedman, D., R. Pisani, R. Purves, and A. Adhikari (1991). Statistics, Second Edition. W. W. Norton & Company Inc., New York.




29th  Monitoring  Research  Review:  Ground-Based  Nuclear  Explosion  Monitoring  Technologies 


Friedman, J. (1989). Regularized discriminant analysis. Journal of the American Statistical Association 84: 165-175.

Maharaj, E. (2000). Clusters of time series. Journal of Classification 17: 297-314.

McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New York.

Pomeroy, P. W., W. J. Best, and T. V. McEvilly (1982). Test ban treaty verification with regional data: A review. Bull. Seism. Soc. Am. 72: S89-S129.

Smidt, R. and L. McDonald (1976). Ridge discriminant analysis. Technical Report 108, University of Wyoming, Department of Statistics, Laramie, Wyoming.

Stuart, A., K. Ord, and S. Arnold (1994). Kendall's Advanced Theory of Statistics: Volume 2A, Classical Inference and the Linear Model, 6th Edition. Arnold Publishers, London.

Taylor, S. and P. Marshall (1991). Spectral discrimination between Soviet explosions and earthquakes using short-period array data. Geophys. J. Int. 106: 265-273.




